{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4facdf7f-680e-4d28-908b-2b8408e2a741",
   "metadata": {},
   "source": [
    "# How to call tools with multi-modal data\n",
    "\n",
    ":::info Prerequisites\n",
    "\n",
    "This guide assumes familiarity with the following concepts:\n",
    "\n",
    "- [Chat models](/docs/concepts/#chat-models)\n",
    "- [LangChain Tools](/docs/concepts/#tools)\n",
    "\n",
    ":::\n",
    "\n",
    "Here we demonstrate how to call tools with multi-modal data, such as images.\n",
    "\n",
    "Some multi-modal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#tool-calling) features as well.\n",
    "\n",
    "To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).\n",
    "\n",
    "Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool in all cases. Let's first select an image, and build a placeholder tool that expects as input the string \"sunny\", \"cloudy\", or \"rainy\". We will ask the models to describe the weather in the image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { DynamicStructuredTool } from \"@langchain/core/tools\";\n",
    "import { z } from \"zod\";\n",
    "\n",
    "const imageUrl = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\";\n",
    "\n",
    "const weatherTool = new DynamicStructuredTool({\n",
    "  name: \"multiply\",\n",
    "  description: \"Describe the weather\",\n",
    "  schema: z.object({\n",
    "    weather: z.enum([\"sunny\", \"cloudy\", \"rainy\"])\n",
    "  }),\n",
    "  func: async ({ weather }) => {\n",
    "    console.log(weather);\n",
    "    return weather;\n",
    "  },\n",
    "});"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8656018e-c56d-47d2-b2be-71e87827f90a",
   "metadata": {},
   "source": [
    "## OpenAI\n",
    "\n",
    "For OpenAI, we can feed the image URL directly in a content block of type \"image_url\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "  {\n",
      "    name: \"multiply\",\n",
      "    args: { weather: \"sunny\" },\n",
      "    id: \"call_ZaBYUggmrTSuDjcuZpMVKpMR\"\n",
      "  }\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "import { HumanMessage } from \"@langchain/core/messages\";\n",
    "import { ChatOpenAI } from \"@langchain/openai\";\n",
    "\n",
    "const model = new ChatOpenAI({\n",
    "  model: \"gpt-4o\",\n",
    "}).bindTools([weatherTool]);\n",
    "\n",
    "const message = new HumanMessage({\n",
    "  content: [\n",
    "    {\n",
    "      type: \"text\",\n",
    "      text: \"describe the weather in this image\"\n",
    "    },\n",
    "    {\n",
    "      type: \"image_url\",\n",
    "      image_url: {\n",
    "        url: imageUrl\n",
    "      }\n",
    "    }\n",
    "  ],\n",
    "});\n",
    "\n",
    "const response = await model.invoke([message]);\n",
    "\n",
    "console.log(response.tool_calls);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5738224-1109-4bf8-8976-ff1570dd1d46",
   "metadata": {},
   "source": [
    "Note that we recover tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling) in the model response."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cee63ff-e09f-4dd8-8323-912edbde94f6",
   "metadata": {},
   "source": [
    "## Anthropic\n",
    "\n",
    "For Anthropic, we can format a base64-encoded image into a content block of type \"image\", as below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d90c4590-71c8-42b1-99ff-03a9eca8082e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "  {\n",
      "    name: \"multiply\",\n",
      "    args: { weather: \"sunny\" },\n",
      "    id: \"toolu_01HLY1KmXZkKMn7Ar4ZtFuAM\"\n",
      "  }\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "import * as fs from \"node:fs/promises\";\n",
    "\n",
    "import { ChatAnthropic } from \"@langchain/anthropic\";\n",
    "import { HumanMessage } from \"@langchain/core/messages\";\n",
    "\n",
    "const imageData = await fs.readFile(\"../../data/sunny_day.jpeg\");\n",
    "\n",
    "const model = new ChatAnthropic({\n",
    "  model: \"claude-3-sonnet-20240229\",\n",
    "}).bindTools([weatherTool]);\n",
    "\n",
    "const message = new HumanMessage({\n",
    "  content: [\n",
    "    {\n",
    "      type: \"text\",\n",
    "      text: \"describe the weather in this image\",\n",
    "    },\n",
    "    {\n",
    "      type: \"image_url\",\n",
    "      image_url: {\n",
    "        url: `data:image/jpeg;base64,${imageData.toString(\"base64\")}`,\n",
    "      },\n",
    "    },\n",
    "  ],\n",
    "});\n",
    "\n",
    "const response = await model.invoke([message]);\n",
    "\n",
    "console.log(response.tool_calls);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a66b7d2f",
   "metadata": {},
   "source": [
    "## Google Generative AI\n",
    "\n",
    "For Google GenAI, we can format a base64-encoded image into a content block of type \"image\", as below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f8184909",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ { name: 'multiply', args: { weather: 'sunny' } } ]\n"
     ]
    }
   ],
   "source": [
    "import { ChatGoogleGenerativeAI } from \"@langchain/google-genai\";\n",
    "import axios from \"axios\";\n",
    "import { ChatPromptTemplate, MessagesPlaceholder } from \"@langchain/core/prompts\";\n",
    "import { HumanMessage } from \"@langchain/core/messages\";\n",
    "\n",
    "const axiosRes = await axios.get(imageUrl, { responseType: \"arraybuffer\" });\n",
    "const base64 = btoa(\n",
    "  new Uint8Array(axiosRes.data).reduce(\n",
    "    (data, byte) => data + String.fromCharCode(byte),\n",
    "    ''\n",
    "  )\n",
    ");\n",
    "\n",
    "const model = new ChatGoogleGenerativeAI({ model: \"gemini-1.5-pro-latest\" }).bindTools([weatherTool]);\n",
    "\n",
    "const prompt = ChatPromptTemplate.fromMessages([\n",
    "  [\"system\", \"describe the weather in this image\"],\n",
    "  new MessagesPlaceholder(\"message\")\n",
    "]);\n",
    "\n",
    "const response = await prompt.pipe(model).invoke({\n",
    "  message: new HumanMessage({\n",
    "    content: [{\n",
    "      type: \"media\",\n",
    "      mimeType: \"image/jpeg\",\n",
    "      data: base64,\n",
    "    }]\n",
    "  })\n",
    "});\n",
    "console.log(response.tool_calls);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5dd4ef4",
   "metadata": {},
   "source": [
    "### Audio input\n",
    "\n",
    "Google's Gemini also supports audio inputs. In this next example we'll see how we can pass an audio file to the model, and get back a summary in structured format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c04c883e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "  {\n",
      "    name: 'summary_tool',\n",
      "    args: { summary: 'The video shows a person clapping their hands.' }\n",
      "  }\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "import { SystemMessage } from \"@langchain/core/messages\";\n",
    "import { StructuredTool } from \"@langchain/core/tools\";\n",
    "\n",
    "class SummaryTool extends StructuredTool {\n",
    "  schema = z.object({\n",
    "      summary: z.string().describe(\"The summary of the content to log\")\n",
    "  })\n",
    "\n",
    "  description = \"Log the summary of the content\"\n",
    "\n",
    "  name = \"summary_tool\"\n",
    "\n",
    "  async _call(input: z.infer<typeof this.schema>) {\n",
    "      return input.summary\n",
    "  }\n",
    "}\n",
    "const summaryTool = new SummaryTool()\n",
    "\n",
    "const audioUrl = \"https://www.pacdv.com/sounds/people_sound_effects/applause-1.wav\";\n",
    "\n",
    "const axiosRes = await axios.get(audioUrl, { responseType: \"arraybuffer\" });\n",
    "const base64 = btoa(\n",
    "  new Uint8Array(axiosRes.data).reduce(\n",
    "    (data, byte) => data + String.fromCharCode(byte),\n",
    "    ''\n",
    "  )\n",
    ");\n",
    "\n",
    "const model = new ChatGoogleGenerativeAI({ model: \"gemini-1.5-pro-latest\" }).bindTools([summaryTool]);\n",
    "\n",
    "const response = await model.invoke([\n",
    "  new SystemMessage(\"Summarize this content. always use the summary_tool in your response\"),\n",
    "  new HumanMessage({\n",
    "  content: [{\n",
    "    type: \"media\",\n",
    "    mimeType: \"audio/wav\",\n",
    "    data: base64,\n",
    "  }]\n",
    "})]);\n",
    "\n",
    "console.log(response.tool_calls);"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "TypeScript",
   "language": "typescript",
   "name": "tslab"
  },
  "language_info": {
   "codemirror_mode": {
    "mode": "typescript",
    "name": "javascript",
    "typescript": true
   },
   "file_extension": ".ts",
   "mimetype": "text/typescript",
   "name": "typescript",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
